Techniques for Inverted Index Compression

Abstract

The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and the stringent performance requirements imposed by the heavy load of queries, the inverted index stores billions of integers that must be searched efficiently. In this scenario, index compression is essential because it leads to a better exploitation of the computer memory hierarchy for faster query processing and, at the same time, allows reducing the number of storage machines. The aim of this article is twofold: first, surveying the encoding algorithms suitable for inverted index compression and, second, characterizing the performance of the inverted index through experimentation.

Similar resources

Inverted Index Compression

The data structure at the core of today's large-scale search engines, social networks and storage architectures is the inverted index, which can be regarded as a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by search engines and stringent performance requirements dictated by the heavy load of user queries, the inverted lists often st...

Inverted Index Compression

The data structure at the core of today's large-scale search engines, social networks, and storage architectures is the inverted index. Given a collection of documents, consider for each distinct term t appearing in the collection the integer sequence ℓ_t, listing in sorted order all the identifiers of the documents (docIDs in the following) in which the term appears. The sequence ℓ_t is called ...
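The definition above can be illustrated with a minimal sketch. It assumes documents are plain strings, terms are whitespace-separated lowercase tokens, and docIDs are simply positions in the input list; the function name `build_inverted_index` is hypothetical and not taken from the cited work.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct term to the sorted list of docIDs containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # set() ensures each docID is appended at most once per term.
        for term in set(text.lower().split()):
            # doc_id only increases, so every list stays sorted.
            index[term].append(doc_id)
    return dict(index)


docs = ["red apple", "green apple", "red car"]
index = build_inverted_index(docs)
```

Here `index["red"]` is the sorted docID sequence for the term "red", which is the kind of integer sequence the compression techniques operate on.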

On Inverted Index Compression for Search Engine Efficiency

Efficient access to the inverted index data structure is a key aspect for a search engine to achieve fast response times to users’ queries. While the performance of an information retrieval (IR) system can be enhanced through the compression of its posting lists, there is little recent work in the literature that thoroughly compares and analyses the performance of modern integer compression sch...

Cluster based Mixed Coding Schemes for Inverted File Index Compression

One way to improve inverted file compression is to use the cluster property [1] of document collections, which states that term occurrences are not uniformly distributed: some terms are used more frequently in some parts of the collection than in others. The corresponding part of the inverted list will consequently contain clusters of small d-gap values. Interpolative code [9] exploits the cluster prop...
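To make the role of d-gaps concrete, the sketch below converts a sorted docID list into its gaps and encodes both forms with a simple variable-byte code. Variable-byte is used here only because it is short to write; it is not the interpolative code mentioned above, and the byte layout (7 payload bits per byte, high bit marking the last byte of a number) is one common convention chosen as an assumption. The sample posting list is invented for illustration.

```python
from itertools import accumulate


def d_gaps(postings):
    """Return the first docID followed by differences between neighbours."""
    if not postings:
        return []
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low-order bits first."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # more bytes follow: high bit left clear
            n >>= 7
        out.append(n | 0x80)      # last byte of this number: high bit set
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


# Two clusters of nearby docIDs (made-up data).
postings = [1000, 1001, 1003, 1004, 5000, 5002]
gaps = d_gaps(postings)
raw_size = len(vbyte_encode(postings))
gap_size = len(vbyte_encode(gaps))
restored = list(accumulate(vbyte_decode(vbyte_encode(gaps))))
```

On this small list the gap form takes 8 bytes against 12 for the raw docIDs, because the gaps inside each cluster fit in a single byte; a running sum of the decoded gaps recovers the original list.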

Optimize Document Identifier Assignment for Inverted Index Compression

Document identifier assignment is a technique for inverted file index compression that reduces the d-gap values of posting lists. Existing studies have approached it with either TSP or clustering methods. However, there is no proper formulation of this problem, and the existing approaches have no theoretical guarantee of being good approximations. In this paper, we first formulate document identifier assignmen...
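The effect of reassigning docIDs can be seen in a toy sketch. Everything in it is an assumption made for illustration: the two-term index, the hand-picked permutation `perm` (not produced by TSP, clustering, or the formulation of the cited paper), and the cost measure, which simply sums the bit lengths of the gaps after shifting the first docID by one so that every gap is at least 1.

```python
def gap_bits(postings):
    """Rough cost: total bit length of the gaps of one sorted posting list."""
    # Shift the first docID by 1 so the first "gap" is never zero.
    gaps = [postings[0] + 1] + [b - a for a, b in zip(postings, postings[1:])]
    return sum(g.bit_length() for g in gaps)


def index_cost(index):
    """Sum the rough cost over all posting lists."""
    return sum(gap_bits(postings) for postings in index.values())


def remap(index, perm):
    """Apply a docID permutation (old -> new) and re-sort every list."""
    return {term: sorted(perm[d] for d in postings)
            for term, postings in index.items()}


# Documents containing the same term are interleaved in the original order.
index = {"a": [0, 2, 4], "b": [1, 3, 5]}
# Hand-picked permutation that places documents sharing a term side by side.
perm = {0: 0, 2: 1, 4: 2, 1: 3, 3: 4, 5: 5}

before = index_cost(index)
after = index_cost(remap(index, perm))
```

With this permutation the lists become `[0, 1, 2]` and `[3, 4, 5]`, and the rough cost drops from 11 to 8 bits, which is the kind of gain a good identifier assignment aims for.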

Journal

Journal title: ACM Computing Surveys

Year: 2021

ISSN: 0360-0300, 1557-7341

DOI: https://doi.org/10.1145/3415148